ABSTRACT

In this paper we show that the minimum number of comparisons necessary for the computation of the $k$th element of a totally ordered set of size $n$, $V_k(n)$, is bounded below by $n-k+(k-1)\lceil \lg(n/(k-1))\rceil$. For
3  <  k  <  r,  this  bound  improves  the  best  lower  bound  presently  known.  A 
new  algorithm  which  yields  an  upper  bound  that  is  better  than  the  currently 
known  bound  for  a  large  range  of  values  of  n  will  also  be  presented. 


1.  Introduction 

The selection problem is to determine the $k$th element of a totally ordered set $P$ of size $n$. Two efficient algorithms for solving this problem are presently known. When $k$ is small with respect to $n$, the algorithm of A. Hadian and M. Sobel ([3]), which needs at most $n-k+(k-1)\lceil \lg(n-k+2)\rceil$ comparisons, is adequate. Another method, using at most $5.73n$ comparisons, was discovered by M. Blum et al. ([1]). This method is more efficient than Hadian and Sobel's method for $k > 4n/\lg n$.

Let $V_k(n)$ denote the minimum number of comparisons necessary for finding the $k$th element of a set of size $n$. The exact values of $V_k(n)$ are known for $k = 1$ ($V_1(n) = n-1$) and $k = 2$ ($V_2(n) = n-2+\lceil \lg n\rceil$; Schreier and Kislitsyn). For $k = 3$, F. Yao ([6]) has obtained a lower bound which is equal to the upper bound of Hadian and Sobel for infinitely many values of $n$. V. Pratt and F. Yao ([5]) also showed that:

for $k \le n/(2\lg n)$: $V_k(n) \ge n-k+(k-1)\lceil \lg n-(k-1)\lg\lg n-2\lg((k-1)!)\rceil$,

for $k \le n/3$: $V_k(n) \ge n+2k-\lg n$,

for $n/3 < k \le \lfloor (n-3)/2\rfloor$: $V_k(n) \ge (3n+k)/2-\lg n-O(1)$,

improving the bound due to Blum et al., except when $\lg n/(2\lg\lg n) < k < \lg n$.

In this paper we first present a new lower bound for $V_k(n)$, namely $n-k+(k-1)\lceil \lg(n/(k-1))\rceil \le V_k(n)$. For $k > 3$ in the range stated in the abstract, this bound is strictly greater than the best previously known bound. For instance, for $k = 5$, this result together with the best known upper bound enables us to determine the value of $V_5(n)$ within a gap of at most 8, while the previously known bounds leave a gap of at least 80. Furthermore, this result shows that, for a fixed value of $k$, the tree selection algorithm is asymptotically optimal.

lg stands for $\log_2$.

We then present a new algorithm for selecting the $k$th largest element of a set of size $n$ which yields an upper bound that strictly improves the previously known bound when $2\lg n < k < 4n/\lg n$ and when $2^i+k-2 < n \le 2^i+2^{\lceil ((k-2)i+1)/(k-1)\rceil-1}$ for any integer $i$. Specifically, for $k = 3$, the new upper bound is $V_3(n) \le n-3+\lceil \lg(n-1)\rceil+\lceil \lg(n-2^{\lfloor (\lg n)/2\rfloor})\rceil$.

Following the original formulation of the selection problem by Lewis Carroll ([2]), we call an element of the set $P$ a "player" and a comparison between two players a "match", which must be won by one of the two players. A procedure for selecting the $k$th largest element will be referred to as a "tournament" for determining the $k$th best player.

2.   The  Lower  Bound 

We want to show that, for any algorithm that computes the $k$th best player among $n$ players, there exists a ranking of the players such that this algorithm must perform at least $n-k+(k-1)\lceil \lg(n/(k-1))\rceil$ comparisons. The idea of using an oracle in our proof is due to Knuth ([4]), who gave a new proof of Kislitsyn's lower bound for $k = 2$. Here we extend this idea to all values of $k$. An oracle is a deterministic process which builds up a ranking among the players while the algorithm tries to find the solution. This ranking, which must satisfy transitivity and antisymmetry, forces the algorithm to perform at least that many comparisons. A correct algorithm cannot stop before the $k$th player is uniquely determined by the oracle (how could the algorithm know the answer if the oracle still has some freedom to choose it?). As a direct consequence, the set of the $k-1$ best players must also be uniquely determined.

We describe the oracle $O$ as an automaton whose states are represented by ordered pairs. Specifically, the state $S_t$ before the $t$th match is $(\varphi_t, E_t)$, where $\varphi_t$ is a mapping from $P$ to $\mathbb{N}$ and $E_t$ is a totally ordered subset of $P$. The initial state is $S_1 = (I, \emptyset)$, where $I$ is the constant mapping such that $I(x) = 1$ for all $x \in P$. Roughly speaking, the players in $E_t$ are the top players: the $i$th player to enter $E_t$ is the $i$th best player. Candidates for entering $E_t$ are selected according to the values of $\varphi_t$.

The input to the oracle at time $t$ is an unordered pair of players $\{x, y\}$ who are engaged in the $t$th match according to the selection procedure. The oracle decides the winner of the match and enters state $S_{t+1}$ according to the following rules:

R1 - If $x \in E_t$ and $y \in E_t$, then $x$ wins if and only if $x > y$ in $E_t$ ($E_t$ is an ordered set). Moreover, $S_{t+1} := S_t$.

R2 - If $x \in E_t$ and $y \notin E_t$, then $x$ wins and $S_{t+1} := S_t$.

R3 - If $x \notin E_t$ and $y \notin E_t$, then the player with the larger weight wins: if $\varphi_t(x) > \varphi_t(y)$, $x$ wins; if $\varphi_t(x) = \varphi_t(y)$, an arbitrary decision compatible with transitivity is made. Say $x$ wins. If $\varphi_t(x)+\varphi_t(y) \ge n/(k-1)$, then $\varphi_{t+1} := \varphi_t$, $E_{t+1} := E_t \cup \{x\}$, and $x$ becomes the smallest element of $E_{t+1}$. If $\varphi_t(x)+\varphi_t(y) < n/(k-1)$, then $E_{t+1} := E_t$, $\varphi_{t+1}(y) := 0$, $\varphi_{t+1}(x) := \varphi_t(x)+\varphi_t(y)$, and $\varphi_{t+1}(z) := \varphi_t(z)$ for all $z \ne x, y$.
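The rules R1 to R3 can be exercised with a small simulation. The following Python sketch is illustrative only and rests on assumptions not made in the text: players are the integers 0..n-1, $E_t$ is stored as a list in order of entry (best first), and in an R3 tie the oracle lets a player win if it already dominates the other through recorded results, otherwise the first argument wins (one of the arbitrary choices R3 allows). The class Oracle and its methods are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations


class Oracle:
    """Sketch of the adversary O (rules R1-R3); players are 0..n-1."""

    def __init__(self, n, k):
        assert 2 <= k <= n
        self.threshold = Fraction(n, k - 1)       # n/(k-1), kept exact
        self.phi = [1] * n                        # initial mapping I(x) = 1
        self.E = []                               # E_t in entry order, best first
        self.beaten = [set() for _ in range(n)]   # direct wins, used for ties

    def _dominates(self, a, b):
        """True if a chain of recorded wins leads from a down to b."""
        stack, seen = [a], {a}
        while stack:
            for z in self.beaten[stack.pop()]:
                if z == b:
                    return True
                if z not in seen:
                    seen.add(z)
                    stack.append(z)
        return False

    def match(self, x, y):
        """Decide the match {x, y}, update the state, return the winner."""
        if x in self.E and y in self.E:           # R1: earlier entrant wins
            w = x if self.E.index(x) < self.E.index(y) else y
        elif x in self.E or y in self.E:          # R2: the member of E wins
            w = x if x in self.E else y
        else:                                     # R3: heavier player wins
            if self.phi[x] != self.phi[y]:
                w = x if self.phi[x] > self.phi[y] else y
            else:                                 # tie: respect dominance, else x
                w = y if self._dominates(y, x) else x
            l = y if w == x else x
            if self.phi[w] + self.phi[l] >= self.threshold:
                self.E.append(w)                  # w becomes smallest of E
            else:
                self.phi[w] += self.phi[l]        # weights merge into w
                self.phi[l] = 0
        self.beaten[w].add(y if w == x else x)
        return w


o = Oracle(6, 3)
wins = [0] * 6
for a, b in combinations(range(6), 2):            # a full round robin
    wins[o.match(a, b)] += 1
print(o.E)  # [0, 2]
```

In this round robin the win counts are 0, 1, ..., 5 (a consistent ranking), the weights still sum to $n$ as in Fact 3, every weight stays below $n/(k-1)$, and $k-1 = 2$ players have entered $E_t$.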


Given that $x$ dominates only $x$ at time 1, we say that $x$ dominates $y$ at time $t+1$ if $x$ dominates $y$ at time $t$, or if $x$ has beaten $y$ in the $t$th match, or if $x$ dominates $z$ and $z$ dominates $y$. Clearly, if $x$ dominates $y$, $x$ is a better player than $y$.
Theorem: The number $V_k(n)$ satisfies $n-k+(k-1)\lceil \lg(n/(k-1))\rceil \le V_k(n)$.

We  first  prove  the  following  lemma: 

Lemma: Using oracle $O$, the $k-1$ best players will have played at least $(k-1)\lceil \lg(n/(k-1))\rceil$ matches when the tournament is completed.

Proof:   The  lemma  follows  from  the  facts  listed  below: 

Fact 1: The number of matches won by $x$ by time $t$ is greater than or equal to $\lceil \lg \varphi_t(x)\rceil$ (when $\varphi_t(x) > 0$).

Fact 2: Let $e_i \in E_t$ be the $i$th player ($1 \le i \le |E_t|$) to enter $E_t$. Then $e_i$ can be dominated only by players $e_j$ with $j < i$.

Fact 3: $\sum_{x \in P} \varphi_t(x) = n$.

We call $W_t$ the set of players $x$ such that $x \notin E_t$ and $\varphi_t(x) > 0$.
Fact 4: $|E_t| + |W_t| > k-1$.

This is a consequence of Fact 3 and of the fact that $\varphi_t(x) < n/(k-1)$ for all $x \in P$.
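Fact 5: At the end of the tournament, $|E_t| \ge k-1$.

Since the players in $W_t$ can be dominated only by players in $E_t$, if $|E_t| < k-1$, then any player in $E_t$ or $W_t$ could still be one of the $k-1$ best players. By Fact 4 there are more than $k-1$ such players, so the set of the $k-1$ best players would not be uniquely determined, a contradiction.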

Fact 6: At the end of the tournament, the $k-1$ best players are the $k-1$ top players in $E_t$.
This is a consequence of Facts 2 and 5.

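When $x$ enters $E_t$ by defeating $y$, we have $\varphi_t(x)+\varphi_t(y) \ge n/(k-1)$ and $\varphi_t(x) \ge \varphi_t(y)$, hence $\varphi_t(x) \ge n/(2(k-1))$. By Fact 1, $x$ has then already won at least $\lceil \lg(n/(k-1))\rceil - 1$ matches, and the match that brings it into $E_t$ is one more. The lemma is therefore a direct consequence of Facts 1 and 6. □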


Proof of theorem: According to the lemma, the $k-1$ best players have played at least $(k-1)\lceil \lg(n/(k-1))\rceil$ matches. Clearly, any player who is not among the $k$ best players has lost at least one match against a player who is not among the $k-1$ best. Thus, there are $n-k$ additional matches which were not included in the count of the matches played by the $k-1$ top players. This completes the proof of the theorem.
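As a small worked check, the bound of the theorem can be evaluated in exact integer arithmetic. The helper names ceil_lg_ratio and lower_bound are hypothetical; $\lceil \lg(n/(k-1))\rceil$ is computed as the smallest $e$ with $(k-1)2^e \ge n$, which avoids floating-point rounding. For $k = 2$ the formula reduces to Kislitsyn's exact value $n-2+\lceil \lg n\rceil$.

```python
def ceil_lg_ratio(n: int, d: int) -> int:
    """Smallest e >= 0 with d * 2**e >= n, i.e. ceil(lg(n/d)) when n >= d."""
    e = 0
    while d << e < n:
        e += 1
    return e


def lower_bound(n: int, k: int) -> int:
    """The theorem's bound n - k + (k-1) * ceil(lg(n/(k-1))) on V_k(n)."""
    assert 1 <= k <= n
    if k == 1:
        return n - 1  # V_1(n) = n - 1 exactly; the formula needs k >= 2
    return n - k + (k - 1) * ceil_lg_ratio(n, k - 1)


print(lower_bound(100, 5))  # 115, i.e. 100 - 5 + 4 * ceil(lg 25)
```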

3.   Improving  the  Upper  Bound 

Since Hadian and Sobel's algorithm needs at most $n-k+(k-1)\lceil \lg(n-k+2)\rceil$ comparisons, the new lower bound presented above enables us to determine $V_k(n)$ to within a gap of at most $(k-1)\lceil \lg(k-1)\rceil$ comparisons. The new algorithm we present reduces that gap when $\lg n < k/2$ and when $2^i+k-2 < n \le 2^i+2^{\lceil ((k-2)i+1)/(k-1)\rceil-1}$, for any integer $i$.
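The gap claim can be checked numerically. The sketch below (helper names hypothetical) compares Hadian and Sobel's upper bound with the new lower bound; the gap is never negative and never exceeds $(k-1)\lceil \lg(k-1)\rceil$, because $\lceil \lg(n-k+2)\rceil \le \lceil \lg n\rceil \le \lceil \lg(n/(k-1))\rceil + \lceil \lg(k-1)\rceil$. For $k = 5$ this gives the gap of at most 8 mentioned in the introduction.

```python
def ceil_lg(m: int) -> int:
    """ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()


def ceil_lg_ratio(n: int, d: int) -> int:
    """Smallest e >= 0 with d * 2**e >= n."""
    e = 0
    while d << e < n:
        e += 1
    return e


def hadian_sobel_upper(n: int, k: int) -> int:
    return n - k + (k - 1) * ceil_lg(n - k + 2)


def lower_bound(n: int, k: int) -> int:
    return n - k + (k - 1) * ceil_lg_ratio(n, k - 1)


def gap(n: int, k: int) -> int:
    return hadian_sobel_upper(n, k) - lower_bound(n, k)


print(max(gap(n, 5) for n in range(5, 5000)))  # 8
```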


We describe the algorithm in a pseudo-ALGOL dialect including set operations ($\cup$, $-$) and list operations (first, last, and "," for concatenation).

We first describe the procedure BEST(i, S), which is a tree selection algorithm used to determine the ordered list of the $i$ best players of the set $S$. The set $S$ is initially divided into two disjoint subsets $S_1$ and $S_2$ such that $S = S_1 \cup S_2$ and $|S_1| = 2^{\lceil \lg |S|\rceil - 1}$. Furthermore, each set is associated with a list TOP(S), which is initially empty.

list procedure WINNER(list L1, list L2) := if last(L1) > last(L2)
    then L1 else L2;

comment WINNER uses one comparison except if one of the two
    lists is empty;

list procedure BEST(integer i, set S);
begin if S ≠ ∅ then
    begin for j := |TOP(S)|+1 step 1 until i do
        begin W := WINNER(BEST(1, S1), BEST(1, S2));
            if TOP(S1) = W then
                begin TOP(S1) := ∅; S1 := S1 - W end
            else
                begin TOP(S2) := ∅; S2 := S2 - W end;
            TOP(S) := TOP(S), W
        end
    end;
    TOP(S)
end.

This tree selection algorithm performs at most $|S|-i+(i-1)\lceil \lg |S|\rceil$ comparisons (see for instance [4] for further details). The new algorithm is an extension of this tree selection algorithm. Let $P$ be the initial set of players, divided into two disjoint subsets $P_1$ and $P_2$ such that $P_1 \cup P_2 = P$ and $|P_1| = 2^{\lceil \lg |P|\rceil - 1}$. The procedure BEST applied to $P$ selects top players one by one in $P_1$ and $P_2$. The new algorithm uses two sequences of positive integers $\{u_a\}$ and $\{v_a\}$, and a characteristic step is to select either the $u_h$ top players of $P_1$ or the $v_j$ top players of $P_2$, according to the results of previous comparisons.
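A minimal Python sketch of BEST, assuming players are distinct comparable numbers (larger is better) and that the comparison count is kept in a one-element list; the class Node and the parameter count are hypothetical names, not part of the original procedure. It follows the split $|S_1| = 2^{\lceil \lg |S|\rceil - 1}$, empties TOP(S_j) when its player moves up, and charges a comparison only when WINNER sees two non-empty lists.

```python
class Node:
    """A set S for BEST: leaves hold one player; |S1| = 2**(ceil(lg|S|) - 1)."""

    def __init__(self, players):
        if len(players) == 1:
            self.left = self.right = None
            self.top = list(players)       # a singleton is its own best player
        else:
            m = 1 << ((len(players) - 1).bit_length() - 1)
            self.left, self.right = Node(players[:m]), Node(players[m:])
            self.top = []                  # TOP(S), best player first


def best(i, s, count):
    """BEST(i, S): the ordered list of the (at most) i best players of s."""
    if s.left is not None:
        while len(s.top) < i:
            a, b = best(1, s.left, count), best(1, s.right, count)
            if not a and not b:
                break                      # s has been exhausted
            if a and b:                    # WINNER costs one comparison...
                count[0] += 1
                side = s.left if a[-1] > b[-1] else s.right
            else:                          # ...and none if a list is empty
                side = s.left if a else s.right
            s.top = s.top + side.top       # TOP(S) := TOP(S), W
            side.top = []                  # TOP(S_j) := empty; S_j := S_j - W
    return s.top[:i]


count = [0]
print(best(3, Node([5, 1, 9, 7, 3]), count), count[0])  # [9, 7, 5] 7
```

Here 7 comparisons are used, within the bound $5-3+2\lceil \lg 5\rceil = 8$. Each extraction after the first recomputes only the path of the removed player, and the lowest node on that path compares against an empty leaf for free, which is where the $\lceil \lg |S|\rceil - 1$ per extraction comes from.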

list procedure SELECT(integer k, set P);
begin h := 1; j := 1; A := u[1] + v[1];
    while A ≤ k do
L1:     begin W := WINNER(BEST(u[h], P1), BEST(v[j], P2));
            if TOP(P1) = W then
                begin TOP(P1) := ∅; P1 := P1 - W;
                    h := h+1; A := A + u[h]
                end
            else
                begin TOP(P2) := ∅; P2 := P2 - W;
                    j := j+1; A := A + v[j]
                end;
            TOP(P) := TOP(P), W
        end;
    R := k - A + u[h] + v[j];
    TOP(P) := TOP(P), PICK(BEST(R, P1), BEST(R, P2));
    comment TOP(P) contains the k best players of P; furthermore,
        the kth element of TOP(P) is the kth player of P;
end.


list procedure PICK(list L1, list L2);
comment selects the top R players from the ordered lists
    L1 and L2 of length R, using R comparisons;
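PICK is an ordinary merge of the heads of two ordered lists, stopped after $R$ outputs. A minimal Python sketch, assuming best-first lists of distinct numbers (larger is better); the function name pick and its return pair are hypothetical.

```python
def pick(l1, l2, r):
    """Top r players from two best-first lists, each of length >= r.

    Every comparison outputs one player, so exactly r comparisons are used.
    Returns (selected players best first, comparisons used).
    """
    assert len(l1) >= r and len(l2) >= r
    out, i, j, comparisons = [], 0, 0, 0
    while len(out) < r:
        comparisons += 1
        if l1[i] > l2[j]:
            out.append(l1[i])
            i += 1
        else:
            out.append(l2[j])
            j += 1
    return out, comparisons


print(pick([9, 7, 3], [8, 6, 5], 3))  # ([9, 8, 7], 3)
```

Because each list has at least $R$ entries and only $R$ players are emitted, neither index can run past the end of its list.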
Remark:   It  is  possible  (and  sometimes  more  efficient)  to 
use  the  procedure  SELECT  recursively  instead  of  the  procedure 
BEST.   In  that  case,  since  the  result  of  SELECT  is  not  an  ordered 
list,  it  is  also  necessary  to  replace  the  procedure  PICK. 
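For experimentation, SELECT can be sketched in Python on top of the tree selection and merge steps above. This is an illustration under assumptions of our own: players are distinct numbers (larger is better), the sequences $\{u_a\}$ and $\{v_a\}$ are passed as 0-indexed lists with at least $k$ entries, $P_2$ holds at least $k$ players (asserted), and the winning list is appended to TOP(P) in each round. The names Node, best, pick, and select are hypothetical.

```python
class Node:
    """A set S split as in BEST: leaves hold one player, |S1| = 2**(ceil(lg|S|) - 1)."""

    def __init__(self, players):
        if len(players) == 1:
            self.left = self.right = None
            self.top = list(players)
        else:
            m = 1 << ((len(players) - 1).bit_length() - 1)
            self.left, self.right = Node(players[:m]), Node(players[m:])
            self.top = []


def best(i, s, count):
    """BEST(i, S): ordered list of the (at most) i best players left in s."""
    if s.left is not None:
        while len(s.top) < i:
            a, b = best(1, s.left, count), best(1, s.right, count)
            if not a and not b:
                break
            if a and b:
                count[0] += 1                     # WINNER inside BEST
                side = s.left if a[-1] > b[-1] else s.right
            else:
                side = s.left if a else s.right
            s.top = s.top + side.top
            side.top = []
    return s.top[:i]


def pick(l1, l2, r, count):
    """PICK: merge the heads of two best-first lists; r comparisons."""
    out, i, j = [], 0, 0
    while len(out) < r:
        count[0] += 1
        if l1[i] > l2[j]:
            out.append(l1[i])
            i += 1
        else:
            out.append(l2[j])
            j += 1
    return out


def select(k, players, u, v):
    """SELECT(k, P); returns (k best players with the kth best last, comparisons)."""
    count = [0]
    m = 1 << ((len(players) - 1).bit_length() - 1)
    assert len(players) - m >= k, "sketch assumes |P2| >= k"
    p1, p2 = Node(players[:m]), Node(players[m:])
    top, h, j = [], 0, 0
    a = u[0] + v[0]
    while a <= k:                                 # the step labelled L1
        l1, l2 = best(u[h], p1, count), best(v[j], p2, count)
        count[0] += 1                             # WINNER: one active comparison
        if l1[-1] > l2[-1]:
            top += l1                             # TOP(P) := TOP(P), W
            p1.top = []                           # P1 := P1 - W
            h += 1
            a += u[h]
        else:
            top += l2
            p2.top = []                           # P2 := P2 - W
            j += 1
            a += v[j]
    r = k - a + u[h] + v[j]
    top += pick(best(r, p1, count), best(r, p2, count), r, count)
    return top, count[0]


top, comps = select(12, list(range(100)), [3] * 12, [3] * 12)
print(top[-1])  # 88, the 12th largest of 0..99
```

Every player appended inside the loop has rank at most $A-1 \le k-1$, so the $R$th player returned by PICK among the remaining ones is the $k$th player of $P$.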

Analysis  of  the  Algorithm 

Since an exhaustive analysis of the algorithm to determine the best possible choices of $\{u_a\}$ and $\{v_a\}$ for given values of $n$ and $k$ is quite tedious, we restrict our study to particular values of $\{u_a\}$ and $\{v_a\}$.

A comparison performed when line L1 of the algorithm is executed, or a comparison performed in the procedure PICK, clearly determines at least one new element of the pool of the $k$ best players. Such a comparison will be referred to as an active comparison; all other comparisons are inactive.




Case 1: $u_a = v_a = \alpha$, $\alpha \in \mathbb{N}$, for all integers $a$.

Assuming that $k = t\alpha$, $t \in \mathbb{N}$, $\alpha+t-1$ active comparisons are performed, and clearly at most $n-2+(k+\alpha-2)(\lceil \lg n\rceil-2)$ inactive ones. Therefore, the difference between the numbers of comparisons performed in the worst case by tree selection and by this algorithm is:

$$k - \big[(\alpha-1)(\lceil \lg n\rceil-1) + k/\alpha\big].$$

The choice $\alpha = 2$ shows that this algorithm strictly improves on tree selection if $k > 2(\lceil \lg n\rceil-1)$. In fact, an optimal way of choosing $\alpha$ is to take the integer closest to $\sqrt{k/(\lceil \lg n\rceil-1)}$.

For instance, suppose we are to select the 90th player among a set of 2048. The choice $u_a = v_a = 3$ for all $a$ in our algorithm will save 39 comparisons over tree selection.
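The Case 1 formula is the difference between tree selection's bound $n-k+(k-1)\lceil \lg n\rceil$ and the count $(\alpha+t-1) + n-2+(k+\alpha-2)(\lceil \lg n\rceil-2)$ with $t = k/\alpha$. A short Python sketch (function names hypothetical) evaluates it and the suggested choice of $\alpha$; it assumes $n \ge 3$ and that $\alpha$ divides $k$.

```python
import math


def ceil_lg(m: int) -> int:
    """ceil(log2(m)) for an integer m >= 1."""
    return (m - 1).bit_length()


def case1_saving(n: int, k: int, alpha: int) -> int:
    """k - [(alpha-1)(ceil(lg n)-1) + k/alpha] for u_a = v_a = alpha."""
    assert k % alpha == 0, "the analysis assumes k = t * alpha"
    return k - ((alpha - 1) * (ceil_lg(n) - 1) + k // alpha)


def suggested_alpha(n: int, k: int) -> int:
    """Integer closest to sqrt(k / (ceil(lg n) - 1)); needs n >= 3."""
    return max(1, round(math.sqrt(k / (ceil_lg(n) - 1))))


print(suggested_alpha(2048, 90))  # 3
```

With $\alpha = 1$ the saving is always 0, as expected, since the algorithm then behaves like tree selection on the two halves.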
Case  2 

We want to choose $\{u_a\}$ and $\{v_a\}$ such that, in the worst case, the number of inactive comparisons is at most $n-2k+(k-1)\lceil \lg n\rceil$. Since at most $k$ comparisons are active, such a choice guarantees that the algorithm is not worse than tree selection.

Assume that $n = 2^{i_1} + 2^{i_2}$ with $i_1 > i_2$. The values of $\{u_a\}$ must satisfy the relation:

n-2-(l  -  Z  u  )(i  -1)  +  (k-l-  Z  u  )(i  -l)gn-2k  +  (k-l)(i  +l); 
i^c^h  a  igc^h  a  x 

il"i2 
that  is:  u,  g  1+- — —  (k-l-   Z   u  ). 

1       l^o^h-1 
The choice $v_a = 1$ for all integers $a$ appears to be always convenient, and a simple calculation shows that the algorithm improves strictly on tree selection if:


$$i_2 < \frac{(k-2)i_1 + 1}{k-1}.$$


For instance, for $k = 7$, using the sequence $u_1 = 4$, $u_2 = 2$, $u_3 = 1$ saves 3 comparisons on tree selection if $2^i < n \le 2^i + 2^{\lceil (5i+1)/6\rceil - 1}$.


For $k = 3$, using $u_1 = 2$ and $u_2 = 1$ saves one comparison on tree selection if $2^i < n \le 2^i + 2^{\lceil (i+1)/2\rceil - 1}$,


and the new upper bound for $V_3(n)$ is

$$V_3(n) \le n-3+\lceil \lg(n-1)\rceil+\lceil \lg(n-2^{\lfloor (\lg n)/2\rfloor})\rceil.$$